Article 13839 (46 more) in alt.cd-rom:
Subject: Re: CD error correction, was:<None>
Summary: Oops.
References: <25svrr$krd@ebh.eb.ele.tue.nl> <1993Aug30.235816.2632@cmkrnl.com> <1993Sep1.150428.5543@bmerh85.bnr.ca>
Date: Thu, 2 Sep 93 15:14:02 GMT

Well, well.  Having received such an overwhelming "BZZZTT, yourself!" from
so many, I decided to look up the facts.  I'll start by responding to Ed
Hall's comments, and first off perform two net.rarities.  First let me
apologize to anyone I offended with my attitude and my misconceptions.
Second, let me provide the real scoop... with references, no less.  Read
on, if you have the time.

My exposure to CD error correction came from the CD-ROM side of the world.
I had to write C code to do the 2-D Reed Solomon error correction for CD-ROM
data since the Hitachi drives we were using did not do this in firmware.  The
bulk of what I had read dealt explicitly with CD-ROM error correction and said 
little about the implicit error correction built into the data encoding on
the disc.  If there's a grain of truth to my statement that CD-audio has no
error correction, it's that the user-accessible data you can read from an
audio CD makes no provision for further error correction - no additional
bytes available.  That's where my misconception lay.

Sometimes if you're wrong long enough, you really start to believe that
you're right.  That's what happened with me, although now I understand 
why I thought what I thought.  If I learned anything from grad school, at
a minimum it's (a) that I'm not always right, and (b) what to do if I
realize I'm not.  The first couple of replies to my posting just sort of 
irritated me.  The ones after that gave me that sinking feeling of impending
crow to be eaten.  In half-self-righteous indignation and a desire to find
out the objective truth, I went to the library for a few hours last night.
Since I no longer work for the company where I did work with CD-ROM, I no
longer have references on hand.  In addition, it's been about a year since
I worked with CD-ROM at all, and almost two since I worked with error
correction.

First reference, one that I had at my previous job: CD ROM - The New Papyrus,
1986, Microsoft Press.  Essentially a collection of papers, including pp 73-83
by Andrew Hardwick, entitled "Error Correction Codes: Key to Perfect Data."
This chapter details the error correction used on CD-ROM, specifically the
2-dimensional Reed-Solomon code that I was referring to earlier.  Looking
closer, I found that it actually did say that CD ROM and CD audio both use
Reed-Solomon (RS) coding, particularly CIRC, or Cross-Interleaved Reed-Solomon
Coding.  According to this book, this level provides a *Byte* Error Rate of
10E-9.  Jamie, if you want a fairly clear description of the CD-ROM layer of
error correction, this book would be a good choice.  He doesn't get technical
enough for you to write code to perform the algorithms, but it's detailed 
enough to get a good feel for what's going on.  One thing to be aware of,
though, Hardwick speaks almost solely in terms of Byte Error Rates, rather
than the Bit Error Rates (BER) that most other authors use.

QUIZ for the stochastically inclined:
        Given a Bit Error Rate of x, 0 < x < 1, what is the corresponding 
        Byte Error Rate?  Note:  BER refers to the fraction of bits that 
        are in error on average, typically expressed as 1E-9, for example.
        Also note that a BER of 1/8 does not mean a Byte ER of 1.  Two or
        more bit errors may occur in the same byte; some bytes may have 
        no errors.
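
For the curious, a sketch of the answer (my own code, not from any of the
references): a byte is in error if any of its 8 bits is, so a BER of x gives
a Byte ER of 1 - (1-x)**8 - assuming independent bit errors, which real
burst errors on a disc certainly are not.

```python
import random

def byte_error_rate(bit_error_rate: float) -> float:
    """Exact Byte Error Rate under independent bit errors:
    a byte is bad unless all 8 of its bits survive."""
    return 1.0 - (1.0 - bit_error_rate) ** 8

# Monte Carlo sanity check at x = 1/8 (note the answer is NOT 1).
random.seed(1)
x = 1.0 / 8.0
trials = 200_000
bad_bytes = sum(
    any(random.random() < x for _ in range(8)) for _ in range(trials)
)
print(byte_error_rate(x))     # 1 - (7/8)**8, about 0.656
print(bad_bytes / trials)     # close to the exact value
```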

A little information on the 2-D RS coding for CD-ROM... first, the 2340-byte
block is split into an even and odd subblock.  This automatically cuts in
half the length of a string of consecutive errors.  Next, the bytes in each
subblock are arranged into a 2-D array.  Instead of filling up one row, then
the next row, etc., the bytes are staggered diagonally, so that no two 
adjacent data bytes from the disc are on the same row or column.  This also
cuts down the effect of a string of errors.  Then the RS codes are applied
to each column and each row, alternately.  The RS codes can correct any
single error and detect most multiple errors.  The beauty of this method is
that, even if you have multiple errors on a given row, they must fall on
different columns.  Thus, the row correction would fail, but the next round
of column correction would fix both.  This method falls apart when you have
errors, say, in a square:

                X - - - X   <- two errors on this row - can't correct
                - - - - -
                - - - - -
                - - - - -
                X - - - X   <- two errors on this row - can't correct

                ^       ^
                |_______|_____ two errors on these columns - can't correct
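
To make the row/column interplay concrete, here's a toy sketch (mine, not
Hardwick's) that models error *positions* only: each pass fixes any row or
column containing exactly one remaining error - the single-error-correcting
power described above - and iterates until nothing more can be done.

```python
def iterative_correct(errors):
    """Toy model of row/column product decoding.  'errors' is a set of
    (row, col) positions; each pass over rows or columns fixes any line
    that contains exactly one remaining error.  Returns what's left."""
    errors = set(errors)
    progress = True
    while errors and progress:
        progress = False
        for axis in (0, 1):               # 0 = row pass, 1 = column pass
            lines = {}
            for pos in errors:
                lines.setdefault(pos[axis], []).append(pos)
            for hits in lines.values():
                if len(hits) == 1:        # single error on this line: fix it
                    errors.discard(hits[0])
                    progress = True
    return errors

# Two errors on one row, different columns: the column pass fixes both.
print(iterative_correct({(0, 0), (0, 4)}))                    # set()
# Four errors in a square: every row and column sees two - decoder stalls.
print(iterative_correct({(0, 0), (0, 4), (4, 0), (4, 4)}))
```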

The syndromes generated by the RS codes, however, can give the correction
program an idea of where the errors are, and other methods can be used to
fix even such error patterns as this.  Apparently, that's part of the
proprietary information of Hardwick's company, because he really jumps
around the subject, giving little detail.  Nonetheless, the article says
that this method fails less than once in 1E4 times, giving a Byte ER of
1E-13, better than the 1E-12 required by the computer industry.  

This book had a sequel, CD ROM volume two, also by Microsoft Press, 1987.  In
chapter 3, pp 31-42, the data on a CD-ROM frame is broken into 12 bytes of
sync, 4 header bytes, 2048 user data bytes, 4 EDC bytes, 8 unused bytes, and
276 ECC bytes.  The author says that for a CD ROM with a BER of 1E-4 and
bursts of over 1000 bad bits, the correction can regenerate all but "one in
every 1E-12 [sic] bits."  Hmmm.... we can correct all but one of every
one-trillionth of a bit.  Go figure.

My next reference is the Essential Guide to CD-ROM, 1986 from Meckler 
Publishing.  In chapter 2, pp 13-32, Bert Gall gets deeper into the actual
encoding on the disc than the other books had touched.  In the terminology
he uses, a CD "frame" consists of 588 channel bits - meaning 588 bits on
the disc itself, where a 0 is a pit or land, and a 1 is the transition
between the two.  These are broken up as follows:

        Synch                           24        + 3   channel bits
        Control & display symbol        1 times  (14+3) channel bits
        Data symbols                    24 times (14+3) channel bits
        Error-correction symbols        8 times  (14+3) channel bits
                                        ---------------
                                        588 channel bits

The (14+3) refers to two things: every eight-bit byte is expanded into 14
bits so that the data is self-clocking.  The 3 bits are added between bytes
to keep a zero DC component (equal number of zeros and ones) in the data on
average.  The synch data is the only part of the disc that is not encoded
with these bits, although after the synch data, the 3 bits are inserted.
The conversion from 8 bits to 14 is referred to as EFM, or Eight-to-Fourteen
Modulation.  This is done by a simple table lookup, although the 256
fourteen-bit words were chosen from the possible 2**14 = 16384 to have
special properties.
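
Putting the table and the (14+3) expansion together, the 588-bit frame
budget checks out (a trivial sanity check; variable names are mine):

```python
EFM_SYMBOL = 14 + 3        # 8 data bits -> 14 channel bits + 3 merging bits
sync       = 24 + 3        # the only part of the frame not EFM-encoded
# 1 control/display symbol + 24 data symbols + 8 error-correction symbols:
frame = sync + (1 + 24 + 8) * EFM_SYMBOL
print(frame)               # 588
```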

From the above breakdown, we see that for every 24 data symbols (bytes),
there are 8 error correction symbols (bytes).  These are arranged physically
on the disc as 12+4+12+4.  This gives a code efficiency of 3/4, meaning that
1/4 of the data on the disc is for error correction.  The first group of 4
corrects single errors in the 24 bytes and flags most multiple symbol errors.
The second group of 4 corrects up to 2 more symbol errors, given the positions
flagged by the first pass.  Any errors that this level finds but cannot fix
are flagged to the D/A conversion layer for muting, interpolation, and
concealment.  Note that some errors may slip by even this layer undetected.
"EDC/ECC correction ensures the high reliability of the CD-ROM
system.  The disc has a bit-error-rate (BER) of 1E-5 to 1E-6.  The CIRC
error-correction system reduces this to 1E-11 to 1E-12.  ECC and EDC finally
brings this to the rate of 1E-15 to 1E-16.  This is one of the best
corrected bit error rates obtainable today."  (p.20)
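
The two-pass behavior Gall describes - flag positions first, then correct
them - is the standard error/erasure tradeoff for RS codes: with r parity
symbols you can handle t errors plus e erasures whenever 2t + e <= r.  A
sketch of the budget for r = 4 (this bound is textbook RS theory, not
something Gall spells out):

```python
r = 4  # parity symbols in each group of the CD frame
# All (errors, erasures) combinations a 4-parity RS code can handle:
budget = [(t, e) for t in range(r + 1) for e in range(r + 1) if 2 * t + e <= r]
print(budget)
# (2, 0): two errors at unknown positions, or
# (0, 4): four erasures whose positions were flagged by an earlier pass.
```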

My final reference is Principles of Digital Audio, a SAMS publication by Ken
C. Pohlmann, 1985 and 1989 2nd ed.  Chapter 8: Error Correction (pp 185-228)
and Chapter 12: The Compact Disc (pp 321-373) contained very valuable
information.  As the title implies, this book deals specifically with digital
audio, in contrast to most of the other sources I've read.  This book reads
almost like a textbook, without homework problems at the end of the chapters.
It gives a background on error correction/detection theory, then goes into
some more specifics about the CIRC coding on CDs.  The data is first encoded
using a (28,24) code C2 resulting in 28 bytes from the original 24 data
bytes, then coded using C1 (32,28), adding four more bytes.  The 24 data
bytes consist of six pairs of 16-bit samples, but the six sampling periods
are scrambled chronologically so that they do not appear on the disc in the
order that they occur in time.  This helps to reduce the audible effect of
long burst errors, although it doesn't help the correction of errors any.
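
Note that C2 and C1 together reproduce the frame layout from the Gall book:
24 data bytes in, 32 bytes out, a code rate of 3/4 (quick arithmetic, mine):

```python
c2_n, c2_k = 28, 24   # C2: 24 data bytes -> 28 (adds 4 parity bytes)
c1_n, c1_k = 32, 28   # C1: C2's 28 bytes -> 32 (adds 4 more)
rate = (c2_k * c1_k) / (c2_n * c1_n)   # overall data-to-symbol rate
print(rate)   # 0.75: 1/4 of the (pre-EFM) bytes are error correction
```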

Pohlmann gives burst-error lengths and their treatments as follows:
        Correctable:       up to 3874 bits
        Good Concealment:       13282 bits
        Marginal Concealment:   15495 bits

He also gives interesting bit rate values, saying that the channel bit rate
on the CD itself is 4.3218 Mbits/sec, of which only 1.41 Mbits/sec is audio
data after synch, error correction, and EFM demodulation.  Nearly 2/3 of the
bits on the surface of the CD are overhead.  A 1-hour audio CD contains
roughly 15.5 billion channel bits, around 5 billion audio data bits.
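
Those rates are easy to cross-check from first principles (my arithmetic:
44.1 kHz x 16 bits x 2 channels for the audio payload, and 588 channel bits
per 24-data-byte frame from the Gall breakdown above):

```python
audio_rate = 44_100 * 16 * 2             # 1,411,200 audio bits/sec
frames_per_sec = audio_rate // (24 * 8)  # 24 data bytes per frame -> 7350
channel_rate = frames_per_sec * 588      # 4,321,800 channel bits/sec
print(audio_rate / channel_rate)         # ~0.327: roughly 2/3 is overhead
print(channel_rate * 3600 / 1e9)         # ~15.6 billion channel bits/hour
print(audio_rate * 3600 / 1e9)           # ~5.08 billion audio bits/hour
```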

Regarding the guy drilling holes in his CDs to test the error correction, a
burst error of 3874 bits is about 2.5 mm on the pit track.  This much can be
fully corrected, even in an audio CD.  The limit for good concealment, 13282
bits, is about 7.7 mm.  Now we see why your 3 mm hole didn't affect the
sound, while your 6 mm hole did.  I find this amazing.

Pohlmann also substantiates the raw BER of a CD as 1E-5 to 1E-6, although he
says, "In practice, because of the data density, even a mildly defective disc
can exhibit a much higher bit error rate."  Elsewhere he says, "If large
numbers of adjacent samples are flagged, the concealment circuitry performs
muting.  Using delay lines, a number of previous valid samples (perhaps 30)
are gradually attenuated....  Errors that have escaped the CIRC decoder
without being flagged are not detected by the concealment circuitry, therefore
do not undergo concealment and might produce an audible click in the audio
reproduction.  Not all CD players are alike in terms of error correction."  He
also gives a CD ROM BER of 1E-15 for mode 1 data.

As an aside, he says Navy studies show that a cruiser carries about
5.32 million pages of documents, about 36 tons.  The amount carried above the
main deck can affect the ship's stability.  In theory, the equivalent on a
CD-ROM would be 20 discs, weighing 280 grams.

So... if you've managed to read this far, thanks for your patience.  Again,
I'm sorry I jumped out with my misinformation, but it was an honest mistake.
I am actually glad to know the truth, painful as the process may have been,
and an evening's research in the library wouldn't kill most of us.  Now, does
anyone want to hear the truth on oversampling??? 8-O

-Bill Eason
-- 
All opinions and factoids expressed are my own or those I've collected, not
necessarily those of my employer.
Bill Eason, using lreid's account
Northern Telecom  Atlanta, GA   (on loan to BNR, Ottawa) 
